Combining labelled and unlabelled data
Abstract
There has been much interest in applying techniques that incorporate knowledge from unlabelled data into a supervised learning system, but less effort has been made to compare the effectiveness of different approaches on real-world problems and to analyse the behaviour of the learning system when different amounts of unlabelled data are used. This paper presents an analysis of the performance of supervised methods augmented with unlabelled data, and of several semi-supervised approaches, across different ratios of labelled to unlabelled samples. The experimental results show that, when supported by unlabelled samples, a classifier generally requires much less labelled data to be built without compromising classification performance. When only a very limited amount of labelled data is available, the results show high variability, and the performance of the final classifier depends more on how reliable the labelled samples are than on the use of additional unlabelled data. Semi-supervised clustering using both labelled and unlabelled data has been shown to offer the most significant improvements when natural clusters are present in the problem considered.
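The abstract does not say which semi-supervised clustering algorithm was used. As an illustration only, the sketch below shows one simple way a clustering method can exploit both kinds of data: the labelled samples initialise one centroid per class and keep their labels throughout (a variant sometimes called constrained k-means), while the unlabelled samples are reassigned to the nearest centroid at every iteration. The function name `seeded_kmeans`, the use of Euclidean distance and the requirement that every class has at least one labelled sample are assumptions of this sketch, not details taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _mean(points: Sequence[Point]) -> Point:
    # Component-wise mean of a non-empty list of points.
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def _dist2(a: Point, b: Point) -> float:
    # Squared Euclidean distance (assumed metric for this sketch).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seeded_kmeans(
    labelled: Sequence[Tuple[Point, int]],
    unlabelled: Sequence[Point],
    n_iter: int = 20,
) -> Tuple[Dict[int, Point], List[int]]:
    """Hypothetical helper: k-means seeded and constrained by labelled samples.

    Assumes every class appears at least once in `labelled`, so no cluster
    can become empty (labelled seeds never leave their own class).
    """
    # Group the labelled seeds by class and use them to initialise centroids.
    by_class: Dict[int, List[Point]] = {}
    for x, y in labelled:
        by_class.setdefault(y, []).append(x)
    centroids = {c: _mean(pts) for c, pts in by_class.items()}

    assign: List[int] = []
    for _ in range(n_iter):
        # Assignment step: each unlabelled point goes to its nearest centroid.
        assign = [min(centroids, key=lambda c: _dist2(x, centroids[c])) for x in unlabelled]
        # Update step: recompute centroids from the fixed seeds plus the
        # unlabelled points currently assigned to each class.
        members = {c: list(pts) for c, pts in by_class.items()}
        for x, c in zip(unlabelled, assign):
            members[c].append(x)
        new_centroids = {c: _mean(pts) for c, pts in members.items()}
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, assign


if __name__ == "__main__":
    # Two labelled seeds and five unlabelled points forming two natural clusters.
    labelled = [((0.0, 0.0), 0), ((5.0, 5.0), 1)]
    unlabelled = [(0.5, -0.2), (-0.3, 0.4), (4.8, 5.3), (5.2, 4.9), (0.1, 0.2)]
    centroids, assign = seeded_kmeans(labelled, unlabelled)
    print(assign)  # [0, 0, 1, 1, 0]
```

Because the unlabelled points move the centroids toward the centres of the natural clusters, a single labelled sample per class can be enough when such clusters exist. Conversely, a mislabelled seed pulls its centroid in the wrong direction, which is consistent with the abstract's observation that reliability of the labelled data dominates when very few labels are available.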
Similar resources
Pattern recognition using labelled and unlabelled data
This thesis presents the results of a three-year investigation into combining labelled and unlabelled data for data classification. In the present world, there are many fields in which the quantity of data available to workers in that field has increased exponentially over the last few years. This has in part been due to improved methods of automatic data capture and in part due to improved ele...
Combining labelled and unlabelled data in the design of pattern classification systems
There has been much interest in applying techniques that incorporate knowledge from unlabelled data into a supervised learning system but less effort has been made to compare the effectiveness of different approaches and to analyse the behaviour of the learning system when using different ratios of labelled to unlabelled data. In this paper various methods for learning from labelled and unlabel...
Intimate Learning: A Novel Approach for Combining Labelled and Unlabelled Data
This paper introduces a new bootstrapping method closely related to co-training and scoped-learning. The method is tested on a Web information extraction task of learning course names from web pages, in which we use very few labelled items as seed data (10 web pages) and combine them with an unlabelled set (174 web pages). The overall performance improved the precision/recall from 3.11%/0.31% for a b...
Combining Labelled and Unlabelled Data: A Case Study on Fisher Kernels and Transductive Inference for Biological Entity Recognition
We address the problem of using partially labelled data, e.g. large collections where only a small portion of the data is annotated, for extracting biological entities. Our approach relies on a combination of probabilistic models, which we use to model the generation of entities and their context, and kernel machines, which implement powerful categorisers based on a similarity measure and some labelled data. This...
A Labelled Graph Based Multiple Classifier System
In general, classifying graphs with labelled nodes (also known as labelled graphs) is a more difficult task than classifying graphs with unlabelled nodes. In this work, we decompose the labelled graphs into unlabelled subgraphs with respect to the labels, and describe these decomposed subgraphs with the travelling matrices. By utilizing the travelling matrices to calculate the dissimilarity for...